A Keyword-Set Search System for Peer-to-Peer Networks

Authors

  • Omprakash D Gnawali
  • Frans Kaashoek
  • Arthur C. Smith
Abstract

The Keyword-Set Search System (KSS) is a Peer-to-Peer (P2P) keyword search system that uses a distributed inverted index. The main challenge in a distributed index and search system is finding the right scheme to partition the index across the nodes in the network. The most obvious scheme is to partition the index by keyword. A keyword-partitioned index requires that the list of index entries for each keyword in a search be retrieved so that all the lists can be joined; only a few nodes need to be contacted, but each sends a potentially large amount of data. In KSS, the index is partitioned by sets of keywords. KSS builds an inverted index that maps each set of keywords to a list of all the documents that contain the words in the keyword-set. When a user issues a query, the keywords in the query are divided into sets of keywords. The document list for each set of keywords is then fetched from the network, and the lists are intersected to compute the list of matching documents. The list of index entries for each set of words is smaller than the list of entries for each word, so search using KSS incurs less query-time overhead. Preliminary experiments using traces of real user queries show that the keyword-set approach is more efficient than a standard inverted index in terms of query communication cost. Insert overhead for KSS grows exponentially with the size of the keyword-set used to generate the keys for index entries. The query overhead for the target application (metadata search in a music file sharing system) is reduced to the result of the query, as no intermediate lists are transferred across the network for the join operation. Given our assumption that free disk space is plentiful and that queries are more frequent than insertions in P2P systems, we believe this is a good tradeoff.

Thesis Supervisor: M. Frans Kaashoek
Title: Professor of Computer Science and Engineering
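The indexing and query steps described in the abstract can be illustrated with a minimal, single-process sketch. It is not the thesis implementation: the names `KeywordSetIndex`, `keyword_sets`, `split_query`, and `key_for` are hypothetical, a local dictionary stands in for the distributed index, SHA-1 is an assumed key function, and the way a query is divided into sets (chunks of `k` sorted words, with the last chunk overlapping the previous one) is only one possible choice. The sketch also indexes only sets of exactly `k` words, so shorter queries are rejected, which is a simplification the abstract does not specify.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def keyword_sets(words, k):
    """Return every k-word subset of the distinct keywords, each in sorted order."""
    unique = sorted({w.lower() for w in words})
    return list(combinations(unique, k))


def split_query(words, k):
    """Divide query keywords into k-word sets that together cover every keyword.

    Assumption: chunk the sorted words; if the count is not a multiple of k,
    the final set reuses earlier words so it still has exactly k members.
    """
    unique = sorted({w.lower() for w in words})
    if len(unique) < k:
        raise ValueError("this sketch needs at least k distinct query keywords")
    groups = [tuple(unique[i:i + k]) for i in range(0, len(unique) - k + 1, k)]
    if len(unique) % k:
        groups.append(tuple(unique[-k:]))
    return groups


def key_for(keyword_set):
    """Hash a sorted keyword-set into an index key (assumed key function)."""
    return hashlib.sha1(" ".join(keyword_set).encode("utf-8")).hexdigest()


class KeywordSetIndex:
    def __init__(self, k=2):
        self.k = k
        # Local stand-in for the index that KSS partitions across nodes.
        self.table = defaultdict(set)

    def insert(self, doc_id, words):
        """Add one entry per keyword-set; return how many entries were written."""
        sets = keyword_sets(words, self.k)
        for ks in sets:
            self.table[key_for(ks)].add(doc_id)
        return len(sets)

    def query(self, words):
        """Fetch the document list for each keyword-set and intersect them."""
        lists = [self.table.get(key_for(ks), set())
                 for ks in split_query(words, self.k)]
        return set.intersection(*lists)


if __name__ == "__main__":
    index = KeywordSetIndex(k=2)
    index.insert("song1", ["Beatles", "Hey", "Jude"])
    index.insert("song2", ["Beatles", "Yesterday", "Live"])
    print(index.query(["beatles", "hey", "jude"]))
```

In this sketch a document with n distinct keywords produces C(n, k) index entries, which shows the insert-side cost that the abstract trades for smaller lists at query time.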

Similar resources

A Genetic Programming-based trust model for P2P Networks

Peer-to-Peer (P2P) systems have been the center of attention in recent years due to their advantages. Since each node in such networks can act both as a service provider and as a client, they are subject to different attacks. It is therefore vital to manage confidence in these vulnerable environments in order to eliminate unsafe peers. This paper investigates the use of genetic ...

Peer-to-Peer Keyword Search Using Keyword Relationship

Decentralized and unstructured peer-to-peer (P2P) networks such as Gnutella are attractive for Internet-scale information retrieval and search systems because they require neither any centralized directory nor any centralized management of overlay network topology and data placement. However, due to this decentralized architecture, current P2P keyword search systems lack useful global knowledge...

Distributed Suffix Tree for Peer-to-Peer Search

Establishing an appropriate semantic overlay on Peer-to-Peer networks to obtain both semantic ability and scalability is a challenge. Current DHT-based P2P networks are limited in their ability to support semantic search. This paper proposes the DST (Distributed Suffix Tree) overlay as the intermediate layer between the DHT overlay and the semantic overlay. The DST overlay supports search of ke...

Design and Implementation of a Semantic Peer-to-Peer Network

Decentralized and unstructured peer-to-peer (P2P) networks such as Gnutella are attractive for large-scale information retrieval and search systems due to scalability, fault-tolerance, and self-organizing nature. This decentralized architecture, however, makes it difficult for traditional P2P networks to globally share useful semantic knowledge among nodes. As a result, traditional P2P networks...

Arpeggio: Metadata Indexing in a Structured Peer-to-Peer Network

Peer-to-peer networks require an efficient means for performing searches for files by metadata keywords. Unfortunately, current methods usually sacrifice either scalability or recall. Arpeggio is a peer-to-peer file-sharing network that uses the Chord lookup primitive as a basis for constructing a distributed keyword-set index, augmented with index-side filtering, to address this problem. We in...

Publication date: 2002